Links and Paths through Life Sciences Data Sources

Authors

  • Zoé Lacroix
  • Hyma Murthy
  • Felix Naumann
  • Louiqa Raschid
Abstract

An abundance of biological data sources contain data on classes of scientific entities, such as genes and sequences. Logical relationships between scientific objects are implemented as URLs and foreign IDs. Query processing typically involves traversing links and paths (concatenation of links) through these sources. We model the data objects in these sources and the links between objects as an object graph. We identify a set of interesting properties for links and paths, such as outdegree, image of a link, cardinality of data objects and links, the number of distinct objects reached by some links, etc. Analogous to database cost models, we use statistics from the object graph to develop a framework to estimate the result size for a query on the object graph. Analogous to training and testing, we use sampled data from queries to estimate the result size. We validate our models using data sampled from four NIH/NCBI data sources. Our research provides a foundation for querying and exploring data sources.

1 Querying Interlinked Sources

An abundance of biological data sources contain data about scientific entities, such as genes and sequences. Each source may have data on one or more logical classes. Logical relationships between scientific objects are implemented as source links between data sources. Together, they form a graph – the source graph. Each source link represents a collection of object links, each going from a data object in one source to another object, in the same or a different source. An object graph is formed of the data objects and links. Formal definitions are in Sec. 2.

Scientists are interested in exploring relationships between scientific objects, e.g., genes and citations. Consider the query “Return all citations of PubMed that are linked to an Omim entry...
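The link properties named in the abstract can be illustrated with a minimal Python sketch. The formal definitions belong to Sec. 2, which is not reproduced here, so the definitions below are assumptions: link cardinality is taken as the number of distinct object links from one source to another, the image as the set of distinct target objects reached, and outdegree as the number of targets per linked origin object. The function name `link_statistics`, the dictionary keys, and all object identifiers are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy object graph. Each object link goes from one
# (source, object id) pair to another. The identifiers are made up
# for illustration and are not taken from the paper's data.
OBJECT_LINKS = [
    (("Omim", "o1"), ("PubMed", "p1")),
    (("Omim", "o1"), ("PubMed", "p2")),
    (("Omim", "o2"), ("PubMed", "p2")),
    (("Omim", "o3"), ("Nucleotide", "n1")),
]


def link_statistics(object_links, start, target):
    """Summarise the source link start -> target under assumed definitions."""
    # Group the distinct target objects by the origin object that links to them.
    targets_by_origin = defaultdict(set)
    for (src, src_id), (dst, dst_id) in object_links:
        if src == start and dst == target:
            targets_by_origin[src_id].add(dst_id)

    # Duplicate (origin, target) pairs are counted once because sets are used.
    link_count = sum(len(t) for t in targets_by_origin.values())
    # Assumed "image": distinct target objects reached by at least one link.
    image = set().union(*targets_by_origin.values())
    linked_origins = len(targets_by_origin)
    return {
        "link_cardinality": link_count,
        "linked_origins": linked_origins,
        "image_size": len(image),
        "avg_outdegree": link_count / linked_origins if linked_origins else 0.0,
    }


stats = link_statistics(OBJECT_LINKS, "Omim", "PubMed")
print(stats)
```

In this toy graph, three object links leave two Omim entries and reach two distinct PubMed citations. Statistics of this kind are the sort of input a cost model could use to estimate result sizes, although the estimation framework itself is not sketched here.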

Related resources

Querying Web-Accessible Life Science Sources: Which paths to choose?

Web-accessible life sciences sources are characterized by a complex graph of overlapping sources, and multiple alternate links between sources. A (navigational) query may be answered by traversing multiple alternate paths between a start source and a target source. Each of these paths may have dissimilar benefit, e.g., the cardinality of result objects that are reached in the target source. Pat...

Composition Methods for Link Discovery

The Linked Open Data community publishes an increasing number of data sources on the so-called Data Web and interlinks them to support data integration applications. We investigate how the composition of existing links and mappings can help discovering new links and mappings between LOD sources. Often there will be many alternatives for composition so that the problem arises which paths can pro...

The Relationship between Subjective Evaluation of Stressors and Depression in Menopausal Women: The Mediating Role of Life Satisfaction

Objective: Previous studies have shown that menopausal women are more likely to experience depression. However, there are few studies that investigated the cognitive mechanism that may have a role in developing depression in menopausal women. Thus, the present study aimed to investigate the mediating role of life satisfaction in the relation between subjective evaluation of stressors and depres...

SCS Connector - Quantifying and Visualising Semantic Paths Between Entity Pairs

A key challenge of the Semantic Web lies in the creation of semantic links between Web resources. The creation of links serves as a mean to semantically enrich Web resources, connecting disparate information sources and facilitating data reuse and sharing. As the amount of data on the Web is ever increasing, automated methods to unveil links between Web resources are required. In this paper, we...

Query Planning in the Presence of Overlapping Sources

Navigational queries on Web-accessible life science sources pose unique query optimization challenges. The objects in these sources are interconnected to objects in other sources, forming a large and complex graph, and there is an overlap of objects in the sources. Answering a query requires the traversal of multiple alternate paths through these sources. Each path can be associated with the be...

Publication date: 2004